On-line Techniques to Adjust and Optimize Checkpointing Frequency
نویسندگان
چکیده
Due to increased susceptibility to soft errors in recent semiconductor technologies, techniques for detecting and recovering from errors are required. Roll-back Recovery with Checkpointing (RRC) is one well known technique that copes with soft errors by taking and storing checkpoints during execution of a job. Employing this technique, increases the average execution time (AET), i.e. the expected time for a job to complete, and thus impacts performance. To minimize the AET, the checkpointing frequency is to be optimized. However, it has been shown that optimal checkpointing frequency depends highly on error probability. Since error probability cannot be known in advance and can change during time, the optimal checkpointing frequency cannot be known at design time. In this paper we present techniques that are adjusting the checkpointing frequency on-line (during operation) with the goal to reduce the AET of a job. A set of experiments have been performed to demonstrate the benefits of the proposed techniques. The results have shown that these techniques adjust the checkpointing frequency so well that the resulting AET is close to the theoretical optimum.
منابع مشابه
Stability Assessment Metamorphic Approach (SAMA) for Effective Scheduling based on Fault Tolerance in Computational Grid
Grid Computing allows coordinated and controlled resource sharing and problem solving in multi-institutional, dynamic virtual organizations. Moreover, fault tolerance and task scheduling is an important issue for large scale computational grid because of its unreliable nature of grid resources. Commonly exploited techniques to realize fault tolerance is periodic Checkpointing that periodically ...
متن کاملCompiler-Assisted Checkpointing
In this paper we present compiler-assisted checkpointing, a new technique which uses static program analysis to optimize the performance of checkpointing. We achieve this performance gain using libckpt, a checkpointing library which implements memory exclusion in the context of user-directed checkpointing. The correctness of user-directed checkpointing is dependent on program analysis and inser...
متن کاملAn Enhanced MSS-based checkpointing Scheme for Mobile Computing Environment
Mobile computing systems are made up of different components among which Mobile Support Stations (MSSs) play a key role. This paper proposes an efficient MSS-based non-blocking coordinated checkpointing scheme for mobile computing environment. In the scheme suggested nearly all aspects of checkpointing and their related overheads are forwarded to the MSSs and as a result the workload of Mobile ...
متن کاملArchitecture Support for Behavior-based Adaptive Checkpointing
Checkpointing is a commonly used approach to provide system fault-tolerance. However, using a constant checkpointing frequency may compromise the system’s overall performance when there are multiple types of QoS requirements involved. Hence, it is important that the checkpointing frequency is customizable and runtime adaptable. However, for open distributed and embedded applications, often ther...
متن کاملVM-μCheckpoint: Design, Modeling, and Assessment of Lightweight In-Memory VM Checkpointing
Checkpointing and rollback techniques enhance reliability and availability of virtual machines and their hosted IT services. This paper proposes VM-μCheckpoint, a light-weight pure-software mechanism for high-frequency checkpointing and rapid recovery for VMs. Compared with existing techniques of VM checkpointing, VM-μCheckpoint tries to minimize checkpoint overhead and speed up recovery by mea...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2009